Comparing MUCK-II and MUC-3: assessing the difficulty of different tasks

Author

  • Lynette Hirschman
Abstract

The natural language community has made impressive progress in evaluation over the last four years. However, as the evaluations become more sophisticated and more ambitious, a fundamental problem emerges: how to compare results across changing evaluation paradigms. When we change domain, task, and scoring procedures, as has been the case from MUCK-I to MUCK-II to MUC-3, we lose comparability of results. This makes it difficult to determine whether the field has made progress since the last evaluation. Part of the success of the MUC conferences has been due to the incremental approach taken to system evaluation. Over the four year period of the three conferences, the domain has become more "realistic", the task has become more ambitious and specified in much greater detail, and the scoring procedures have evolved to provide a largely automated scoring mechanism. This process has been critical to demonstrating the utility of the overall evaluation process. However we still need some way to assess overall progress of the field, and thus we need to compare results and task difficulty of MUC-3 relative to MUCK-II.

Similar resources

GE: description of the NLToolset system as used for MUC-3

The GE NLToolset aims at extracting and deriving useful information from text using a knowledge-based, domain-independent core of text processing tools, and customizing the existing programs to each new task. The program achieves this transportability by using a core knowledge base and lexicon that adapts easily to new applications, along with a flexible text processing strategy that is tole...

Intra-Individual and Inter-Levels of Metacognition across EFL Writing Tasks of Multi Difficulty Levels

This study investigated the quality of metacognition at its inter-individual level, i.e., socially-shared metacognition, across two collaborative writing tasks of different difficulty levels among a cohort of Iranian EFL learners. Moreover, it examined the correlation between the individual and the social modes of metacognition in writing. The analysis of think-aloud protocols of a number of ...

New York University: description of the PROTEUS system as used for MUC-4

The PROTEUS Syntactic Analyzer was developed starting in the fall of 1984 as a common base for all the applications of the PROTEUS Project. Many aspects of its design reflect its heritage in the Linguistic String Parser, previously developed and still in use at New York University. The current system, including the Restriction Language compiler, the lexical analyzer, and the parser proper, co...

The Impact of Rating Methods and Task Types on EFL Learners' Writing Scores

The difficulty of assessing the writing skill is well known. Different testing facets seem to affect the result of assessing the writing skill. In addition to the writer’s ability, the topic of the writing task, and methods of rating may contribute to the writer’s score. In this study, 50 EFL learners wrote four different types of writing tasks (convincing, describing, instructing, and explaini...

Publication date: 1991